
192 research outputs found

webPRANK: a phylogeny-aware multiple sequence aligner with interactive alignment browser

Author: A Löytynoja
Ari Löytynoja
B Paten
C Dessimoz
C Kosiol
D Maddison
H McWilliam
J Felsenstein
K Wong
M Hasegawa
Nick Goldman
R Development Core Team
S Whelan
W Fletcher
W Pearson
Z Yang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Phylogeny-aware progressive alignment has been found to perform well in phylogenetic alignment benchmarks and to produce superior alignments for the inference of selection on codon sequences. Its implementation in the PRANK alignment program package also allows modelling of complex evolutionary processes and inference of posterior probabilities for sequence sites evolving under each distinct scenario, either simultaneously with the alignment of sequences or as a post-processing step for an existing alignment. This has led to software with many advanced features, and users may find it difficult to generate optimal alignments, visualise the full information in their alignment results, or post-process these results, e.g. by objectively selecting subsets of alignment sites. Results: We have created a web server called webPRANK that provides an easy-to-use interface to the PRANK phylogeny-aware alignment algorithm. The webPRANK server supports the alignment of DNA, protein and codon sequences as well as protein-translated alignment of cDNAs, and includes built-in structure models for the alignment of genomic sequences. The resulting alignments can be exported in various formats widely used in evolutionary sequence analyses. The webPRANK server also includes a powerful web-based alignment browser for the visualisation and post-processing of the results in the context of a cladogram relating the sequences, allowing (e.g.) removal of alignment columns with low posterior reliability. In addition to de novo alignments, webPRANK can be used for the inference of ancestral sequences with phylogenetically realistic gap patterns, and for the annotation and post-processing of existing alignments. The webPRANK server is freely available on the web at http://tinyurl.com/webprank. Conclusions: The webPRANK server incorporates phylogeny-aware multiple sequence alignment, visualisation and post-processing in an easy-to-use web interface. It widens the user base of phylogeny-aware multiple sequence alignment and allows the performance of all alignment-related activity for small sequence analysis projects using only a standard web browser.
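The abstract mentions removing alignment columns with low posterior reliability. The sketch below illustrates that kind of post-processing in Python. It is not webPRANK's or PRANK's interface: the function name filter_columns, the dictionary-of-strings alignment format, the per-column score list and the 0.9 threshold are all assumptions made for illustration.

```python
def filter_columns(alignment, posteriors, threshold=0.9):
    """Keep only alignment columns whose reliability score meets the threshold.

    alignment:  dict mapping sequence name -> aligned sequence (equal lengths).
    posteriors: one reliability score per column (hypothetical input format).
    """
    # Indices of the columns that pass the (assumed) reliability cut-off.
    keep = [i for i, p in enumerate(posteriors) if p >= threshold]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy data: columns 3 and 4 are poorly supported and get dropped.
alignment = {"seq_a": "ACG-T", "seq_b": "AC-TT"}
posteriors = [0.99, 0.95, 0.40, 0.50, 0.97]
filtered = filter_columns(alignment, posteriors)
```

With the toy input, both sequences are reduced to the three well-supported columns. The right threshold depends on the analysis and is left as a parameter.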

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Effects of marker type and filtering criteria on QST-FST comparisons

Author: Fraimout A.
Li Z.
Löytynoja A.
Merilä J.
Publication date: 01/11/2019

Comparative studies of quantitative and neutral genetic differentiation (QST-FST tests) provide means to detect adaptive population differentiation. However, QST-FST tests can be overly liberal if the markers used deflate FST below its expectation, or overly conservative if methodological biases lead to inflated FST estimates. We investigated how marker type and filtering criteria for marker selection influence QST-FST comparisons through their effects on FST using simulations and empirical data on over 18 000 in silico genotyped microsatellites and 3.8 million single-nucleotide polymorphism (SNP) loci from four populations of nine-spined sticklebacks (Pungitius pungitius). Empirical and simulated data revealed that FST decreased with increasing marker variability, and was generally higher with SNPs than with microsatellites. The estimated baseline FST levels were also sensitive to filtering criteria for SNPs: both minor alleles and linkage disequilibrium (LD) pruning influenced FST estimation, as did marker ascertainment. However, in the case of the stickleback data used here, where QST is high, the choice of marker type, their genomic location, ascertainment and filtering made little difference to the outcomes of QST-FST tests. Nevertheless, we recommend that QST-FST tests using microsatellites should discard the most variable loci, and those using SNPs should pay attention to marker ascertainment and properly account for LD before filtering SNPs. This may be especially important when the level of quantitative trait differentiation is low and levels of neutral differentiation are high. © 2019 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited. Peer reviewed
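The finding that FST falls as marker variability rises can be illustrated with a small calculation. The sketch below uses Nei's G_ST, (H_T - H_S) / H_T, as a stand-in; the abstract does not say which estimator the authors used, so the estimator, the function name gst and the toy allele frequencies are assumptions for illustration only. Because H_T cannot exceed 1, G_ST is capped at 1 - H_S, so highly variable loci give low values even when populations share no alleles.

```python
def gst(pop_freqs):
    """Nei's G_ST for one locus.

    pop_freqs: list of allele-frequency lists, one per population, all the
    same length. Equal population weights are assumed for simplicity.
    """
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # Mean within-population expected heterozygosity (H_S).
    hs = sum(1.0 - sum(p * p for p in freqs) for freqs in pop_freqs) / n_pops
    # Expected heterozygosity of the pooled population (H_T).
    mean = [sum(freqs[a] for freqs in pop_freqs) / n_pops for a in range(n_alleles)]
    ht = 1.0 - sum(p * p for p in mean)
    return (ht - hs) / ht if ht > 0 else 0.0


# Biallelic SNP fixed for different alleles in two populations.
snp = [[1.0, 0.0], [0.0, 1.0]]
# Ten-allele microsatellite: the populations share no alleles at all,
# but each carries five equally frequent alleles.
microsat = [[0.2] * 5 + [0.0] * 5, [0.0] * 5 + [0.2] * 5]

gst_snp = gst(snp)
gst_microsat = gst(microsat)
```

Both toy loci are completely differentiated, yet the SNP gives G_ST = 1 while the microsatellite gives only 1/9, because within-population heterozygosity is already 0.8.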

Helsingin yliopiston digitaalinen arkisto

PhyloSim - Monte Carlo simulation of sequence evolution in the R statistical computing environment

Author: A Löytynoja
A Pang
A Rambaut
A Varadarajan
Botond Sipos
D Tian
DT Gillespie
E Paradis
Gregory E Jordan
H Bengtson
H Philippe
JL Oliver
JL Thorne
JP Huelsenbeck
KP Schliep
LJ Harmon
M Blanchette
M Kimura
MS Rosenberg
N de la Chaux
N Goldman
Nick Goldman
RA Cartwright
S Whelan
TG Clark
Tim Massingham
W Fletcher
Z Yang
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The Monte Carlo simulation of sequence evolution is routinely used to assess the performance of phylogenetic inference methods and sequence alignment algorithms. Progress in the field of molecular evolution fuels the need for more realistic and hence more complex simulations, adapted to particular situations, yet current software makes unreasonable assumptions such as homogeneous substitution dynamics or a uniform distribution of indels across the simulated sequences. This calls for an extensible simulation framework written in a high-level functional language, offering new functionality and making it easy to incorporate further complexity. Results: PhyloSim is an extensible framework for the Monte Carlo simulation of sequence evolution, written in R, using the Gillespie algorithm to integrate the actions of many concurrent processes such as substitutions, insertions and deletions. Uniquely among sequence simulation tools, PhyloSim can simulate arbitrarily complex patterns of rate variation and multiple indel processes, and allows for the incorporation of selective constraints on indel events. User-defined complex patterns of mutation and selection can be easily integrated into simulations, allowing PhyloSim to be adapted to specific needs. Conclusions: Close integration with R and the wide range of features implemented offer unmatched flexibility, making it possible to simulate sequence evolution under a wide range of realistic settings. We believe that PhyloSim will be useful to future studies involving simulated alignments.
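The Gillespie algorithm named in the abstract can be sketched in a few lines. The example below is a generic, heavily simplified illustration in Python; it is not PhyloSim code (PhyloSim is an R package), and the function name, event names and rates are assumptions. It also keeps the rates fixed, whereas a real sequence simulator would recompute them after every event because substitutions and indels change the sequence.

```python
import random


def gillespie_events(rates, t_max, rng):
    """Sample (time, event) pairs on one branch of length t_max.

    rates: dict mapping event name -> rate per unit time (assumed constant
    here; a real simulator updates the rates after each event).
    """
    names = list(rates)
    weights = [rates[n] for n in names]
    total = sum(weights)
    t = 0.0
    events = []
    while total > 0:
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # The event type is chosen with probability proportional to its rate.
        event = rng.choices(names, weights=weights)[0]
        events.append((t, event))
    return events


rng = random.Random(42)  # fixed seed so the run is repeatable
rates = {"substitution": 1.0, "insertion": 0.05, "deletion": 0.05}
events = gillespie_events(rates, t_max=2.0, rng=rng)
```

Drawing one exponential waiting time from the summed rate and then picking the event type in proportion to its rate is what lets several concurrent processes share a single timeline.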

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Evolutionary Sequence Analysis and Visualization with Wasabi

Author: A Löytynoja
A Veidenberg
A Yates
AJ Vilella
B Paten
BR Baum
DR Maddison
DR Zerbino
J Huerta-Cepas
J Zhang
K Katoh
MA Larkin
MN Price
MV Han
S Kumar
YS Cho
Z Yang
Publication venue: Humana press
Publication date: 01/01/2021

Wasabi is an open-source, web-based graphical environment for evolutionary sequence analysis and visualization, designed to work with multiple sequence alignments within their phylogenetic context. Its interactive user interface provides convenient access to external data sources and computational tools and is easily extendable with custom tools and pipelines using a plugin system. Wasabi stores intermediate editing and analysis steps as workflow histories and provides direct-access web links to datasets, allowing for reproducible, collaborative research and easy dissemination of the results. In addition to shared analyses and installation-free usage, the web-based design allows Wasabi to be run as a cross-platform, stand-alone application and makes its integration with other web services straightforward. This chapter gives a detailed description of, and guidelines for the use of, Wasabi's analysis environment. Example use cases give step-by-step instructions for practical application of the public Wasabi, from quick data visualization to branched analysis pipelines and publishing of results. We end with a brief discussion of advanced usage of Wasabi, including command-line communication, interface extension, offline usage, and integration with local and public web services. Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

Fast and robust multiple sequence alignment with phylogeny-aware gap placement

Author: A Biegert
A Löytynoja
A Viterbi
Adam M Szalkowski
AM Altenhoff
AM Szalkowski
B Paten
C Dessimoz
C Grasso
C Lee
D Robinson
DA Dalquen
G Gonnet
GH Gonnet
GW Stuart
J Felsenstein
JD Thompson
JL Thorne
JM Sauder
K Katoh
M Anisimova
M Kimura
O Gascuel
O Gotoh
R Durbin
RC Edgar
S Pascarella
S Whelan
SA Benner
SB Needleman
Publication venue: Springer Science and Business Media LLC

Crossref

Accurate reconstruction of insertion-deletion histories by statistical phylogenetics

Author: A Heger
A Löytynoja
A Siepel
AG Clark
AM Moses
Art F. Y. Poon
B Knudsen
B Paten
B Rannala
Benedict Paten
C Lee
C Strope
DG Higgins
EF Moore
FA Matsen
FR Kschischang
G Lunter
Gerton Lunter
I Holmes
I Miklós
Ian Holmes
J Felsenstein
JD Thompson
JL Thorne
JS Pedersen
K Katoh
K Liu
KM Wong
KS Pollard
L Gomez-Valero
L Zhu
M Larkin
M Mohri
MA Suchard
N de la Chaux
O Kamneva
O Westesson
Oscar Westesson
P Markova-Raina
R Mills
RA Cartwright
RC Edgar
RK Bradley
S Nelesen
S Saccone
S Sinha
T Beissbarth
X Qu
Z Wang
Z Yang
Z Zhang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2012

The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history, or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of several methods for reconstruction of indel history. The methods tested include a relatively new algorithm for statistical marginalization of MSAs that sums over a stochastically-sampled ensemble of the most probable evolutionary histories. For mammalian evolutionary parameters on several different trees, the single most likely history sampled by our algorithm appears less biased than histories reconstructed by other MSA methods. The algorithm can also be used for alignment-free inference, where the MSA is explicitly summed out of the analysis. As an illustration of our method, we discuss reconstruction of the evolutionary histories of human protein-coding genes. Comment: 28 pages, 15 figures. arXiv admin note: text overlap with arXiv:1103.434

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive

FigShare

Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

Author: A Löytynoja
B Sipos
BG Hall
BP Blackburne
C Chothia
C Dessimoz
C Kemena
C Notredame
CB Do
CL Strope
DA Dalquen
DA Morrison
DH Mathews
ER Mardis
G Blackshields
G Jordan
G Landan
GP Raghava
I Walle Van
J Kim
J Stoye
JD Thompson
JH Havgaard
JP Huelsenbeck
K Mizuguchi
LA Stebbings
M Anisimova
M Pop
MR Aniba
P Gardner
RA Cartwright
RB Russell
RC Edgar
SA Berger
SF Altschul
T Golubchik
T Koestler
T Lassmann
W Fletcher
Publication venue: Springer Science and Business Media LLC
Publication date: 09/11/2012

Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies--based on simulation, consistency, protein structure, and phylogeny--and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should choose their benchmark according to the context of application--with a keen awareness of the assumptions underlying each benchmarking strategy. Comment: Review

arXiv.org e-Print Archive

Crossref

UCL Discovery

Vitellogenin Underwent Subfunctionalization to Acquire Caste and Behavioral Specific Expression in the Harvester Ant Pogonomyrmex barbatus

Author: A Bourke
A Dolezal
A Khila
A Li
A Löytynoja
A Stamatakis
A Toth
A Tóth
AFG Bourke
C Holt
C Kent
C Lucas
C Smith
CM Nelson
CS Moreau
D Bates
D Cardoen
D Gordon
DS Marco Antonio
EO Wilson
G Suen
GE Robinson
GV Amdam
H Havukainen
H Lin
J Hancock
J Wang
JD Thompson
Jianzhi Zhang
JT Jackson
K Crailsheim
K Ingram
K Katoh
KJ Livak
Laurent Keller
M Andersson
M Corona
M Hawkins
M Piulachs
M Scharf
MH Haydak
Miguel Corona
N Franks
N Goto
Oksana Riba-Grognuz
P Babin
R Acher
R Bonasio
R Gadagkar
R Page
Romain A. Studer
Romain Libbrecht
S Blank
S Camazine
S Capella-Gutiérrez
S Cardinal
S Khalil
S Lewis
S Nygaard
T Cremonez
T Fujita
T Junier
T Trenczek
W Engels
W Rutz
Y Ben-Shahar
Y Wurm
Yannick Wurm
Z Yang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

PMCID: PMC3744404. This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

CiteSeerX

Crossref

Directory of Open Access Journals

Serveur académique lausannois

PubMed Central

Queen Mary Research Online

FigShare

Evolutionary distances in the twilight zone -- a rational kernel approach

Author: A Keller
A Löytynoja
A Stamatakis
B Chor
B Schölkopf
Benjamin Merget
C Cortes
C Daskalakis
CB Do
E Rivas
F Bemm
Florian Markowetz
Frank Förster
G Talavera
HH Otu
I Ulitsky
J Felsenstein
J Friedrich
J Hein
JL Thorne
Jörg Schultz
KM Wong
LS Wang
M Höhl
M Mohri
M Wolf
MA Buchheim
MA Suchard
Matthias Wolf
MJ Bishop
MK Kuhner
MS Waterman
N Goldman
N Higham
R Durbin
RC Edgar
RF Doolittle
Roland F. Schwarz
S Roch
S Whelan
SR Eddy
T Mailund
T Müller
TH Ogden
V Levenshtein
W Fletcher
Wayne Delport
William Fletcher
Publication venue: Public Library of Science (PLoS)
Publication date: 23/11/2010

Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets. Comment: to appear in PLoS ONE

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MDC Repository

How reliably can we predict the reliability of protein structure predictions?

Author: A Drummond
A Krogh
A Löytynoja
B Knudsen
B Redelings
Balázs Dombai
D Gusfield
D Kneller
D Metzler
DF Feng
F Ronquist
G Lunter
H Zhou
I Holmes
I Miklós
István Miklós
J Felsenstein
J Garnier
J Kececioglu
J Skolnick
JL Thorne
Jotun Hein
K Karplus
K Mizuguchi
L Wang
M Dayhoff
M Suchard
M Waterman
N Goldman
N Metropolis
O Gotoh
P Hogeweg
R Bradley
R Durbin
R Fleissner
S Eddy
S Wu
SB Needleman
T Hubbard
TF Smith
W Hastings
W Press
Ádám Novák
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Comparative methods have been the standard techniques for in silico protein structure prediction. The prediction is based on a multiple alignment that contains both reference sequences with known structures and the sequence whose unknown structure is predicted. Intensive research has been carried out to improve the quality of multiple alignments, since misaligned parts of the multiple alignment yield misleading predictions. However, sometimes all methods fail to predict the correct alignment, because the evolutionary signal is too weak to find the homologous parts due to the large number of mutations that separate the sequences. Results: Stochastic sequence alignment methods define a posterior distribution of possible multiple alignments. They can highlight the most likely alignment and, beyond that, give posterior probabilities for each alignment column. We made a comprehensive study on the HOMSTRAD database of structural alignments, predicting secondary structures in four different ways. We showed that alignment posterior probabilities correlate with the reliability of secondary structure predictions, though the strength of the correlation differs between protocols. The correspondence between the reliability of secondary structure predictions and alignment posterior probabilities is closest to the identity function when the secondary structure posterior probabilities are calculated from the posterior distribution of multiple alignments. The largest deviation from the identity function was obtained when predicting secondary structures from a single optimal pairwise alignment. We also showed that alignment posterior probabilities correlate with the 3D distances between the Cα atoms of amino acids in superimposed tertiary structures. Conclusion: Alignment posterior probabilities can be used to a priori detect errors in comparative models on the sequence alignment level.

CiteSeerX

Crossref

SZTAKI Publication Repository

Springer - Publisher Connector

PubMed Central

Oxford University Research Archive

ELTE Digital Institutional Repository (EDIT)